The Cost of Recovery in Message Logging Protocols

Authors

  • Sriram Rao
  • Lorenzo Alvisi
  • Harrick M. Vin
Abstract

Past research in message logging has focused on studying the relative overhead imposed by pessimistic, optimistic, and causal protocols during failure-free executions. In this paper, we give the first experimental evaluation of the performance of these protocols during recovery. Our results suggest that applications face a complex trade-off when choosing a message logging protocol for fault tolerance. On the one hand, optimistic protocols can provide fast failure-free execution and good performance during recovery, but are complex to implement and can create orphan processes. On the other hand, orphan-free protocols either risk being slow during recovery, e.g., sender-based pessimistic and causal protocols, or incur a substantial overhead during failure-free execution, e.g., receiver-based pessimistic protocols. To address this trade-off, we propose hybrid logging protocols, a new class of orphan-free protocols. We show that hybrid protocols perform within two percent of causal logging during failure-free execution and within two percent of receiver-based logging during recovery.

Index Terms: Distributed computing, fault tolerance, log-based rollback recovery, pessimistic protocols, optimistic protocols, causal protocols, hybrid protocols.

Similar resources

Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols

With the number of computing elements spiraling to hundreds of thousands in modern HPC systems, failures are common events. Few applications are nevertheless fault tolerant; most are in need of a seamless recovery framework. Among the automatic fault-tolerant techniques proposed for MPI, message logging is preferable for its scalable recovery. The major challenge for message logging protocols i...

Performance Evaluation of Consistent Recovery Protocols Using MPICH-GF

This paper presents an implementation of several consistent recovery protocols at the abstract device level and their performance comparison. We have performed experiments using three NAS Parallel Benchmark applications with class C datasets on state-of-the-art equipment. The interesting result is that the causal message logging protocol has the most expensive recovery cost with communication intens...

The Relative Overhead of Piggybacking in Causal Message Logging Protocols

Message logging protocols ensure that crashed processes make the same choices when re-executing nondeterministic events during recovery. Causal message logging protocols achieve this by piggybacking the results of these choices (called determinants) on the ambient message traffic. By doing so, these protocols do not create orphan processes nor introduce blocking in failure-free executions. To s...

On the Use of Cluster-Based Partial Message Logging to Improve Fault Tolerance for MPI HPC Applications

Fault tolerance is becoming a major concern in HPC systems. The two traditional approaches for message passing applications, coordinated checkpointing and message logging, have severe scalability issues. Coordinated checkpointing protocols make all processes roll back after a failure. Message logging protocols log a huge amount of data and can induce an overhead on communication performance. Hi...

Message Logging: Pessimistic, Optimistic, Causal, and Optimal

Message-logging protocols are an integral part of a popular technique for implementing processes that can recover from crash failures. All message-logging protocols require that, when recovery is complete, there be no orphan processes, which are surviving processes whose states are inconsistent with the recovered state of a crashed process. We give a precise specification of the consistency pro...

Publication date: 1998